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ABSTRACT 

We introduce a framework for describing the halo selection function of optical cluster finders. We 
treat the problem as being separable into a term that describes the intrinsic galaxy content of a halo 
(the Halo Occupation Distribution, or HOD) and a term that captures the effects of projection and 
selection by the particular cluster finding algorithm. Using mock galaxy catalogs tuned to reproduce 
the luminosity dependent correlation function and the empirical color-density relation measured in the 
SDSS, we characterize the maxBCG algorithm applied by Kocstcr et al. to the SDSS galaxy catalog. 
We define and calibrate measures of completeness and purity for this algorithm, and demonstrate 
successful recovery of the underlying cosmology and HOD when applied to the mock catalogs. We 
identify principal components — combinations of cosmology and HOD parameters — that are recov- 
ered by survey counts as a function of richness, and demonstrate that percent- level accuracies are 
possible in the first two components, if the selection function can be understood to ~ 15% accuracy. 
Subject headings: cosmology: theory — cosmological parameters — galaxies: clusters — galaxies: 
halos — methods: statistical 
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1. INTRODUCTION 

It has long been known that the abundance of massive 
halos in the universe is a powerful c osmological probe. 
From theoretical considerations ([Press fc Schechterl 
[1971: iBond et allHOflll : iSheth & Tormed I2OO20 one ex- 
pects the number of massive clusters in the universe 
to be exponentially sensitive to the amplitude of the 
matter power spectrum erg, a picture that has been 
confirmed with extensive numerical simulations (e.g. 
Jenkins et a"ni2001l: ISheth fc Tormenll2002l : IWarren et all 



20051) .^ Moreover, since the number of halos also depends 



on the mean matter density of the universe, cluster abun- 
dance constraints typically result in degeneracies of the 
form CTs^^m ~ constant where 7 ~ 0.5. This type of 
constraint is usually r eferred to as a cl uster normaliza- 
tion condition(sec e.g. iRozo et"ani2004l . for a discussion 
of the origin of this degeneracy). 

There are, however, important difficulties one must 
face in determining erg from any given cluster sample. 
Specifically, the fact that cluster masses cannot be di- 
rectly observed implies that some other observable such 
as X-ray emission or galaxy ovcrdcnsity must be relied 
upon both to detect halos and estimate their masses. 
Consequently, characterizing how what one sees, the clus- 
ter population, is related to what we can predict, the 
halo population, is of fundamental importance. In fact, 
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it is precisely these types of systematic uncertainties that 
dominate the error budget in current cosm ological con- 
straints from c luster abundances (see e.g. iSeliakI 120021 : 
iPierpaofi et"an i2003: Hcnrv 2004). 

Optical surveys are traditionally thought of as be- 
ing particularly susceptible to these types of systemat- 
ics, a belief that is largely historical in origin. The 
earliest cluster catalogs available were cre ated through 
visual identi f icatiq n of galaxy clu s ters (I Abel 



Zwickv et al.l [19681 IShectmanI [1981 [AbelT et all 



1958 



1989 



Gunn et al.lll986l ). and thus cluster selection was inher- 
ently not quantifiable. While the situation was much 
improved by the introduction of automated cluster- 
finding algorithms ( [Shectmm et al. 199j; 

iDalton et all ll997tTGaret all l2000l l'2003l ). projection 
effects — the identification of spurious concentra- 
tions of galaxies along the line of sight as physical 
groupings — remained a significant obstacle (see e.g. 
Lucev 198 3: Katgcrt et al. 1996 : Postman et al. 1999 
van Haarlem et all Il997t lOke et all Il998t ). These pro- 



jection effects can be minimized by turning to spec- 
troscopic surveys, though even then difficulties arise 
due t o the fing er-of-god elongation along the line of 
sight (iHuchra fc GcUcr 1 982; Nolthonius fc White 198j; 
Moore et al.l 119931: iRamella et al. 1989: Kochanek et all 
20031: iMerchan fc Zandivarej 12002; ,Eke et all l2004l: 



Yang et all l2005bl: iMiller et all 120051 : iBerfind eTaLl 
2006[) . Alternatively, several new optical cluster- 



finding algorithms have been developed that take ad- 
vantage of the accurate photometry available in large 
digital sky surv eys such as the Sloan Digital Sky 
Survey (SDSS, lYork et all [200l to la rgely, though 
not completely, overcome th is difficu lty (Kepner et alj 
1991 TCladders fc Ycc 2000 jWhite~fc Kochanek 200i 



Goto et al.ll2002l : iKim et al.l[2?M [Koester et al.ll2007al) 



The challenge that confronts optical cluster work to- 
day is to demonstrate that these type of selection effects 
can be, if not entirely overcome, then at least properly 
taken into account within the context of parameter esti- 
mation in cosmological studies. In this work, we intro- 
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ducc such a scheme. The key idea behind our analysis is 
to define the cluster selection function P{Nobs\m) as the 
probability that a halo of mass m be detected as clus- 
ter with Nobs galaxies, and then make the fundamental 
assumption that cluster detection is a two-step process: 
first, there is a probability P{Nt\'m) that a mass m halo 
will contain Nt galaxies, and second, there is a matrix 
PiNobslNf) which describes the probability that a halo 
with Nf galaxies will be detected as a cluster with N^bs 
galaxies. In other words, we are assuming that cluster 
detection depends on mass primarily through the num- 
ber of galaxies hosted by the cluster's halo. Note that 
the probability P{Nobs\Nt) characterizes not only mea- 
surement errors but any possible systematic errors such 
as line of sight projections. A key advantage of defining 
the problem in this way is that it leads one naturally to 
precise definitions of purity and completeness for a given 
cluster sample, and allows for proper marginalization of 
our results over all major systematic uncertainties. 

The formalism outlined here could be generally ap- 
plied to any optically-identified cluster survey, but we 
focus herein on its application to the SDSS maxBCG 
catalog. In particular, we test the method by populat- 
in g dark matter simula tions with galaxies as described 
in lWechsler et all (|2007t ) and then running the maxB CG 
cluster finding algorithm of IKoester et al.l ()2007af ) in 
the resulting mock galaxy catalog. Note that the re- 
sulting cluster catalog will suffer all of the major sys- 
tematics affecting th e corresponding data catalog from 
IKoester et al.l (|2007b[ ) , including incompleteness (non de- 
tections), impurities (false detections), and systematic 
biasing of galaxy membership in clusters from galaxies 
projected along the line of sight. By comparing the un- 
derlying halo population to the resulting cluster catalog 
we characterize the maxBCG cluster selection function in 
the mock catalogs. Using a maximum likelihood analy- 
sis, we then demonstrate that when the cluster selection 
function is known at a quantitative level, we can suc- 
cessfully recover the cosmological and HOD parameters 
of each of the mocks to within the intrinsic degeneracies 
of the data. We emphasize that these results explicitly 
demonstrate that our analysis correctly takes into ac- 
count the systematic uncertainties inherent to the data. 

The layout of the paper is as follows. In ^we describe 
our model, including our parameterization of the vari- 
ous systematic uncertainties that affect real cluster sam- 
ples. The quantitative calibration of the cluster selection 
function for the max BCG cluster finding algorithm of 
IKoester et al.l ()2007al) through the use of numerical sim- 
ulations is detailed in fJH In 21 investigate whether 
our model accurately describes the cluster selection func- 
tion, and in particular, whether we can recover the cos- 
mological parameters of mock cluster samples using the 
techniques developed in this paper. We summarize in 21 

2. THE MODEL 

This section describes our general framework in detail. 
We begin by considering a perfect cluster finding algo- 
rithm, and slowly add the various layers of complexity 
that arise in the real world. We first allow observational 
scatter in cluster richnesses, and demonstrate that this 
naturally gives rise to the concepts of purity and com- 
pleteness. We then include the effects of photometric 
redshift uncertainties, finally discussing how a careful 



calibration of these various difficulties can be included 
within a maximum likelihood analysis of an observational 
data set. 

2.1. The Basic Picture 

The basic tenet of our model is that galaxy clusters 
are associated with massive halos. Consider then a clus- 
ter sample where Nobs is used to denote the number of 
observed galaxies within the cluster. We refer to Nobs 
as the cluster's richness. If P{Nobs\'m) is the probabil- 
ity that a mass m halo has Nobs galaxies, and there are 
{d (n) / dm)dm such halos, the number of clusters with 
Nobs galaxies is simply 

{n{Nobs)) = j dm ^P{Nobs\m) (1) 
If one were to bin the data such that bin a ~ 

[Rrmn, Rmax) COntainS aU clusters with Rmax > Nobs > 

Rmin, one need only sum the above expression over the 
relevant values of Nobs- As we shall see momentarily, it 
is useful to define a binning function tpa{Nobs) such that 
ipaiNobs) = 1 if Nobs falls in bin a and zero otherwise. 
With this definition, the number of clusters in bin a can 
be re-expressed as 

(na) = Jdm ^(^a|m). (2) 

where (ipal'Ti) contains the sum over all Nobs and is de- 
fined as 

= J2PiNobs\m)i,aiNobs)- (3) 

Nabs 

Proper modeling of an observational sample reduces to 
understanding the probability distribution P{Nobs\'m')- 

We consider first the case of a perfect cluster-finding 
algorithm: assume the algorithm detects all halos, there 
are no false detections, and that the observed number of 
galaxies Nobs is equal to the true number of halo galaxies 
Nt for every halo. The probability P{Nt\m) is called the 
Halo Occupation Distribution (HOD), and characterizes 
the int rinsic scatter in the r ichness-r nass relation . Fol- 
lowing iKraytsov et all ()2004 see also lZheng et al.ll2005l : 
lYang et al.ll2005a^ we assume that the total number of 
galaxies in a halo takes the form A''^ = 1 -I- Ngat where 
Nsat, the number of satellite galaxies in the cluster, is 
Poisson distributed at each m with an expectation value 
{Nt\m) given by 

(^.at|m)=(^^y. (4) 

Here, Mi is the characteristic mass at which halos ac- 
quire one satellite galaxy. Note that in cluster abundance 
studies, the typical mass scale probed is considerably 
larger than Mi. Nevertheless, the above parametrization 
is convenient since degeneracies between HOD and cos- 
mological parameters take on partic ularly simple form s 
when parameterized in this way fsee lRozo et al.|[2004[ ). 

2.2. Noise in Galaxy Membership Assignments 

In general, the observationally-dctermined number of 
galaxies Nobs in a cluster may differ from the true number 
of galaxies Nt in the corresponding halo. That is to say. 
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we expect there is a probability distribution P{Nobs\Nt) 
that gives us the probabihty that a halo with Nt galaxies 
will be detected as a cluster with Nobs galaxies. Before we 
look in more detail at the probability matrix P{Nobs\Nt), 
we investigate how the above assumption affects the fi- 
nal expression for cluster abundances. Equations [2] and 
[3] are, of course, unchanged, though the probability ma- 
trix P(iVo6s|m) is no longer identical to the halo occu- 
pation distribution P{Nt\m). Rather, it is related to the 
HOD via P{Nobs\m) = J^n, PiNobs\Nt)P{Nt\m). Con- 
scqucntly, the quantity (■(/'a|m) becomes 



{^pa\m) 



Nt 



i^a{Nt)P{Nt\m) 



(5) 



where ipaiNt) is defined as 

i^a{Nt) = MNobs)P{Nobs\Nt). (6) 

Nob. 

Since equation [5] has the same form as equation [3l as 
long as P(Nobs\Nt) is known we can view observational 
errors simply as a re-binning of the data. 

2.3. Completeness 

In general, we expect non-zero matrix elements in the 
matrix P{Nobs\Nt) to arise in one of two ways: 

1. The cluster-finding algorithm worked correctly: it 
detected a cluster where there is a halo, and the 
assigned richness Nobs is close to its expected value 

(NobslNt). 

2. The cluster-finding algorithm worked incorrectly: 
it cither failed to detect a cluster where there was a 
halo {Nobs = 0, Nt 0), detected a cluster where 
there were no halos {Nobs 7^ 0, Nt = 0), or the 
richness estimate was grossly incorrect. 

Imagine marking now every non-zero matrix element 
of the matrix P{Nobs\Nt) with a point on the Nobs — Nt 
plane (see Figure [5] for an example). In general, we 
expect points for which the cluster-finding algorithm 
worked correctly to populate a band around the expec- 
tation value {Nobs\Nt) , which we refer to as the signal 
band. Points falling outside the signal band we refer to 
as noise, and represent those instances where the cluster- 
finding algorithm suffered a catastrophic error. Generi- 
cally, we expect that the values of the probability matrix 
P{Nobs\Nt) within the signal band will be stable and easy 
to characterize, whereas the noise part of the matrix will 
be unstable and difficult to characterize. Our challenge is 
then to come up with a reasonable way to account for the 
noise part of the probability matrix in cluster abundance 
studies. 

Let us begin our attack on this problem with some 
definitions. We define the quantity c{Nt) as the proba- 
bility that a halo with Nt galaxies be correctly detected. 
Thus, c{Nt) is simply the sum of all matrix elements 
P{Nobs\Nt) within the signal band at fixed Nt. Note 
since c{Nt) is the probability of a halo being detected as 
signal, the expectation value for the fraction of signal ha- 
los is precisely c{Nt). We thus refer to c{Nt) as the com- 
pleteness function. We emphasize, however, that c{Nt) 



is fundamentally a probability, and consequently it con- 
tributes to the correlation matrix of the observed cluster 
counts. We also define the signal matrix Ps{Nobs\Nt) via 



Ps{Nobs\Nt) = P{Nobs\Nt)/c{Nt) 



(7) 



for matrix elements within the signal band, and 
Ps{Nobs\Nt) = otherwise. In other words, the signal 
matrix is what the probability matrix would be if there 
was no noise (catastrophic errors) in the data. Finally, 
we define the noise matrix Pn{Nobs\Nt) to be zero within 
the signal band, and equal to P{Nobs\Nt) otherwise. Wc 
can thus write 

P{Nobs\Nt) = c{Nt)Ps{Nobs\Nt) + Pn{Nobs\Nt). (8) 

Inserting this expression into equations [2] and (H we see 
that the total abundance is a sum of a signal term and a 
noise term, 

(na) = {na)s + {na)„- (9) 

If one is willing to drop the information contained within 
the noise term, then characterizing P{Nobs\Nt) in the 
noise regime become unnecessary. All one needs to do 
instead is characterize the completeness c{Nt) and the 
noise contribution {na)^- 

2.4. Purity 

Wc take a probabilistic approach for characterizing the 
noise contribution {na)^ the cluster density. Specifi- 
cally, we define the purity function p{Nobs) as the prob- 
ability that a cluster with Nobs galaxies be signal. Con- 
sider then a fixed richness Nobs, and let N be the num- 
ber of observed clusters and Ns be the number of signal 
clusters. The expectation value for Ns given N is thus 
{Ns\N) = Np where p is the purity. Note, however, that 
this is not the quantity we are interested in. For model- 
ing purposes, we are interested in the number of observed 
clusters N given the predicted number of signal clusters 
Ns. That is, we need to compute {N\Ns). To compute 
this number, we first note that the probability distribu- 
tion P{Ns\N) is a simple binomial distribution 



P{Ns\N) 



N 
Ns 



p''^{l-p) 



N-N. 



(10) 



In the limit iV ^ 1, wc can approximate the bino- 
mial distribution as a Gaussian with expectation value 
{Ns\N) = Np and variance Ya,r{Ns\N) = Np{l - p). 
By Bayes's theorem, P{N\Ns) is simply proportional to 
P{Ns\N), so we find 



P{N\Ns) 



A 



y^27TNp{l~p) 



cxp 



{N - Ns/pf 
'2N{l~p)/p 



where A is a normalization constant. Wc can further sim- 
plify this expression in the limit Ns 3> 1 and for cases 
where the purity is close to unity. In this limit, the prob- 
ability distribution p{Ns\N) becomes very narrow, and 
the expectation value for the random variable x defined 
via Ns{l + x) — N will be close to zero. Expanding 
the above expression around x ^ Q and keeping only the 
leading order terms we obtain 



p{x\Ns) = Aexp - 



{x - ^if 



2o-2 



(12) 
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where fi — [1 — p)/p and — fi/Ng. Note that since 
(7^ <C we can extend the range of x to vary from — oo to 
+00, in which case p{x\Ns) becomes a simple Gaussian. 
It foUows that the expectation value {N\Ns) is given by 
{N\Ns) = Ns/p, exactly what we would expect. 

At this point it might seem that the above argument 
was really an unnecessary complication in that our end 
result is exactly what we would have naively guessed. 
Nevertheless, our argument is useful in that it proves 
that this guess is indeed correct. Most importantly, hav- 
ing identified purity as a probability allows us to compute 
the statistical uncertainty associated with the purity of 
the sample. Of course, in general there is also an addi- 
tional associated systematic uncertainty when one does 
not know the purity function with infinite precision. 

Turning our attention back to the expectation value 
for the number density of clusters in a given bin, the re- 
lation {N\Ns) = Ns/p implies that the expected number 
density of clusters in a given richness bin is given by 



= / dm {cil:a/p\m) 
am 



(13) 



where 



{cipa/p\m) 



J2 PiNt\m)PsiNobs\Nt) 



c{Nt)MNobs) 

p{Nobs) 

(14) 

and the sum extends over all Nt and Nobs values. 

Before we end, we would like to reiterate our main 
point: the importance of the above algebraic juggling is 
that, provided one is willing to part with the information 
contained within the noise part of the probability matrix, 
we can model observed cluster abundances using only the 
signal matrix, the completeness function, and the purity 
function. Moreover, not only have we proved that we 
do not need to know the details of the tails of the full 
probability matrix P{Nobs\Nt), we have shown that not 
knowing these tails naturally gives rise to the concepts 
of both completeness and purity. 

2.5. Photometric Redshift Uncertainties 

An additional complication that we need to consider 
in our analysis is the effect of photometric redshift es- 
timation for the various clusters. Here, we make the 
simple assumption that photometric redshift estimates 
can be characterized through a probability distribution 
p{zc\zh)dzc, where denotes the true halo redshift and 
Zc denotes the photometrically estimated cluster redshift. 
In general, we expect the probability p{zc\zh) will depend 
on the cluster richness Nobs since the number of galaxies 
contributing to the photo-z estimate for the cluster in- 
creases with Nobs- On the other hand, systematic errors 
can mitigate the sensitivity to cluster richness. The bot- 
tom line is that when applying our method to real data, 
it is important to check whether the assumption that 
p{zc\zh) is richness independent or not is a valid one. 
Generalizing to richness-dependent photometric redshift 
errors is not particularly difficult. We simply chose not to 
consider this case in the interest of simplicity. Moreover, 
we will see later that neglecting these dependences do 
not result in noticeable errors in parameter estimation. 

Consider then the expression for the total number of 
clusters in a given richness bin and within some pho- 
tometric redshift range [zmim Zmax]- We already know 



the comoving number density of halos at redshift Zh is 
given by equation [21 To get the number of clusters at 
an observed redshift Zc, we first multiply by the comov- 
ing volume {dV/dzh)dzfi — Ax^(dx/ dzh)dzh to get the 
total number of clusters at redshift z/i, and then multi- 
ply by the probability p{zc\zh)dzc that the clusters are 
observed within some redshift range dzc- In the above 
expressions, A is the area of the survey and x is the co- 
moving distance to redshift Zh- Summing over all halo 
redshifts, and over the photometric redshift range con- 
sidered Zc G [zmin, Zmax], the total number of clusters Na 
in a given bin is 



{Na) = / dZh 



dzo{na{zh)) ^p{zc\zh)- (15) 

We will find convenient in the future to rewrite the 
above expression in terms of a redshift selection function 
(p{zc), defined to be unity if Zc € [zmin, Zmax] and zero 
otherwise. In terms of tp, the above expression becomes 

/JT r 
dzh {na{zh)) — {^\zh) (16) 
dzji 

where 



{v\zh) 



dzc p{zc\zh)ip{zc). 



(17) 



The reason this recasting is useful is that in this lan- 
guage it becomes obvious that the relation between Zc 
and Zh is the same as that between Nobs and Nt- This 
then implies that when we set out to compute the like- 
lihood function for the observed number of clusters Na, 
the same algebra that describes uncertainties due to rich- 
ness estimation will describe uncertainties due to redshift 
estimation, allowing us to quickly derive the relevant ex- 
pressions for one if we know the other. 

2.6. The Likelihood Function 

So far we have only concerned ourselves with devel- 
oping a model for the expectation value of the cluster 
number density. In order to use this model as a tool for 
extracting cosmological parameters from observed clus- 
ter samples, we now attack the problem of modeling the 
likelihood of observing a particular richness function. In 
this work, we chose to model the probability of observing 
a realization given a set of cosmological and HOD param- 
eters as a Gaussian. While more accur ate likelihood func- 
tions can be found in the literature ()Hu fc CohnI 120061 : 
IHoldeil[200l . these ignore correlations due to scatter in 
the mass-observable relation, and thus we have opted for 
a simple Gaussian model which is expected to hold if bins 
are sufficiently wide (i.e. contain > 10 clusters). All we 
need to do now is to compute the various elements of the 
correlation matrix {SfiaSna')- Moreover, since our goal is 
to perform a maximum-likelihood analysis, we calculate 
not the correlation between cluster densities, but rather 
the correlation matrix for the actual observed number of 
clusters in a given bin. We denote the number of clusters 
in richness bin a as Na, and the correlation matrix as C, 

so that Ca,a' = {SNaSNa') whcrC SNa =Na- (Na). 

The correlation matrix element Ca^a' has six distinct 
contributions: 

1 . A Poisson contribution due to the Poisson fluctua- 
tion in the number of halos of mass m within any 
given volume. 
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2. A sample variance contribution reflecting the fact 
that the survey volume may be slightly overdense 
or underdense with respect to the universe at large. 

3. A binning error arising from the stochasticity of 
Nobs BiS a function of m and the probabilistic nature 
of the completeness function. 

4. A contribution due to the statistical uncertainties 
associated with photometric redshift estimation. 

5. A contribution due to the stochastic nature of the 
purity function. 

In principle, there is an additional pointing error for 
those cases in which the central cluster galaxy is misidcn- 
tified. This error is typically negligible except for small 
area surveys, or surveys with highly irregular window 
functions, so we have opted to ignore this effect. We will 
see below that this did not affect our parameter estima- 
tion in our mock catalogs at any noticeable level. The 
derivation of each of these contributions to the correla- 
tion matrix is so mewhat lengthy, so we shal l simply re- 
fer th e reader to iHu fc Kravtsovl l)2003f) and iRozo et all 
(|2004f ). which detail the general procedure for deriving 
the relevant correlation matrices. The final expression for 
the various matrix elements can be found in Appendix [A1 
Here, we simply take for granted that we can compute 
correlation matrix Ca,a'- Given the correlation matrix 
elements, and assuming flat priors for all of the relevant 
model parameters, the likelihood function becomes 



/:(n,p|N) = A 



exp{~i(N-(N)).C-i-(N-(N))} 



v/(27r)'^^det(C) 



where ^ is a normalization constant, M is the number 
of bins, N = {A^i, 7V2, Nm} is the data vector, H 
is the set of parameters describing cosmology and the 
HOD, and p is the set of nuisance parameters charac- 
terizing the purity and completeness functions, and the 
parameters describing the signal matrix Ps{Nobs\Nt). In 
general, calibration of the cluster-finding algorithm with 
simulations should allow one to place strong priors on the 
distribution of the nuisance parameters, in which case 
the above likelihood function is simply multiplied by the 
corresponding a priori (simulation-calibrated) probabil- 
ity distribution p{p). 

3. SELECTION FUNCTION CALIBRATION FOR THE 
MAXBCG ALGORITHM 

In the previous section, we developed a general frame- 
work with which one may quantitatively characterize 
the cluster selection function of any cluster finding al- 
gorithm. We now proceed to calibrate this selection 
function for the maxB CG cluster finding algorithm from 
iKoester et al.l ([20073) • The idea behind the calibration 
is simple: we use an empirically driven algorithm to pop- 
ulate dark matter simulations with galaxies, resulting in 
realistic galaxy catalogs comparable to the SDSS data. 
We then run the maxBCG cluster-finding algorithm on 
each of our mock galaxy catalogs, and compare the re- 
sulting mock cluster catalogs to the original input halo 
catalog of the simulation to directly measure the matrix 
elements P{Nobs\Nt). In what follows, we only briefly 
describe both the mock catalogs and the cluster flnding 



algorithm, as the relevant details can be found elsewhere. 
Rather, we focus on the key aspects of the analysis that 
are particular to the general framework developed earlier 
in section [2l 

3.1. The Simulations 

The N-body simulation based mock ca talogs we use in 
our ca libration are described in detail in lWechsler et all 
()2007f ). Briefly, galaxies arc attached to dark matter 
particles i n the Hubble Volum e light-cone simulation de- 
scribed in lEvrard et al.l ()2002[ ) using an observationally- 
motivated biasing scheme. The relation between dark 
matter particles of a given overdensity (on a mass scale 
of ^ lelSM©) is related to the correlation function of the 
particles; these particles are then chosen to reproduce 
the luminosity-depend ent correlatio n func tion as mea- 
sured in the SDSS bv IZehavi et"an (|2005D . The num- 
ber of galaxies of a given brightness placed within the 
simulations is determined by draw hig galaxies from th e 
SDSS galaxy luminosity function (jBlanton et al.l [2003( 1 . 
Finally, colors are assigned to each galaxy by measuring 
their local galaxy density, and assigning to them the col- 
ors of a real SDSS galaxy with similar lum inosity and lo- 
cal density (see also lTasitsiomi et "ani2004[ ). This method 
produces mock galaxy catalogs that reproduce several 
properties of the observed SDSS galaxies, and in par- 
ticular follow the empirical color-galaxy density relation 
and its evolution, which is is particularly important for 
ridgeline based cluster detection methods. In this work, 
we use three different realizations of this general popula- 
tions scheme. The resulting catalogs are labeled Mocks 
A, B, and C. Each of these catalogs has different HOD, 
which allows us test the robustness of the selection func- 
tions to varying cosmologies. 

3.2. The maxBCG Cluster- Finding Algorithm 

Details of how the m axBCG cluster - finding algorithm 
works can be found in IKoester et al.l lj2007a[ ). Briefiy, 
maxBCG assumes that the Brightest Cluster Galaxy 
(BCG) in every cluster resides at the center of the clus- 
ter. These BCG galaxies are found to have a very tight 
color-magnitude relation, which is used to select candi- 
date BCGs, and to evaluate the likelihood Cbcg that 
these candidates are indeed true BCGs. In addition, 
maxBCG uses the fact that all known clusters have a 
so-called ridgeline population of galaxies, bright early- 
type galaxies that populate a narrow ridgeline in color- 
magnitude space. Using a model for the radial and color 
distribution of ridgeline galaxies in clusters, the likeli- 
hood Cr that the galaxy population around the candi- 
date BCGs is due to a cluster being present is computed. 
These likelihoods are maximized as a function of red- 
shift, which provides a photometric redshift estimate for 
the cluster. The candidate BCGs are then rank ordered 
according to the total hkelihood C = CbCbcg- The 
top candidate BCG is selected as a cluster BCG, and its 
satellite galaxies are removed from the candidate BCGs. 
In the algorithm, all ridgeline galaxies within a specified 
scaled radius i?2oo of the clust er are conside r ed sate l- 
lite galaxies (details are given in IKoester et al.l ([20073)). 
The process is then iterated until a final cluster catalog 
is obtained. 

3.3. Matching Halos to Clusters 
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Fig. 1. — The cost function A(Naba) defined in equation 1191 
The function compares the observed abundance to that predicted 
using the matching between observed and halo-based richness, ig- 
noring purity and completeness. The solid line is obtained using 
exclusive maximum shared membership matching to compute the 
probability matrix estimate P{N^iji,\Nt). The dashed line is ob- 
tained with exclusive BCG matching, while the thin dotted line 
is obtained using non-exclusive maximum membership matching. 
Other non-exclusive matchings, including the probabilistic algo- 
rithms discussed in the text, look quite similar to the dotted line. 
The upturn at high richness for the one to one matchings are due to 
catastrophic errors in the cluster-finding algorithm (i.e. noise term 
in the P{Noi,g\Nt) matrix) and are unphysical. We chose exclusive 
maximum shared membership matching as our fiducial halo-cluster 
matching algorithm. 

Given a halo catalog and the corresponding mock clus- 
ter catalog, estimating the matrix element P{Nobs\Nt) 
becomes a simple matter of measuring the fraction of 
halos with Nf galaxies detected as clusters with Nohs 
galaxies. Of course, in order to compute this fraction, 
one needs first to define Nt and Nobs , and then one needs 
to know how to find the correct cluster match for individ- 
ual halos in the halo catalog. Concerning the first point, 
and in the interest of having a volume limited catalog 
to z = 0.3, we define Nt as the number of galaxies in a 
halo (i.e. within r2oo, where r2oo is the radius at which a 
the mean density of the cluster is 200 times the critical 
density of the universe) above an i-band luminosity of 
0.4L*. Note that no color cut is applied in the definition 
of Nt. As mentioned earlier, we take Nobs to be simpl y 
the Nl^lf° richness estimate from lKoester et al.l (|2007aD . 
Note that in general, the probability matrix P{Nobs\NT:) 
will depend on precisely how one defines galaxy member- 
ship for both halos and clusters. In this work, we focus 
exclusively on the above definitions, and leave the prob- 
lem of whether our result can be improved upon by a 
redefinition of halo and cluster richness for future work. 
The above definitions arc intuitively reasonable ones, and 
thus provide a good starting point for our analysis. 

We now turn to the problem of matching halos to clus- 
ters. In general, there is no unique way of matching ha- 
los to clusters and vice-versa. For instance, a halo could 
be matched to the cluster that is found nearest to it. 
to the halo that contains the clusters central galaxy, or 
to the cluster that contains the largest fraction of that 
halo's galaxy members. Note that since different match- 
ing schemes will result in different probability matrices, 
one needs to consider multiple schemes and determine 
which is most correct. We define a matching algorithm 



Fig. 2. — The estimated probability matrix P{Nabs\^t) in Mock 
A. Non-zero matrix elements are marked with diamonds. The best- 
fit power law to the maximum-likelihood relation between N^bs a^nd 
Nt is shown above as the thick solid line. The dashed curves define 
the signal band: everything within these lines is considered signal 
in the sense that it corresponds to proper halo-cluster matches. 
Points outside this band, including the points on each axis, are 
considered noise in the sense that they represent catastrophic errors 
of the cluster-finding algorithm (see text for how the dashed lines 
are defined); these points will contribute to the incompleteness and 
the impurity of the sample. Note that blending, that is, matching 
of low richness halos to high richness clusters, is clearly more of 
a problem than halo splitting (m atching of high richnes s halos to 
low richness cluster) , as argued in IKoester et al.l (I2007al) . We only 
show Nt > 10 as this corresponds to the resolution limit of the 
simulation. 

to be optimal if it minimizes the cost function 

\n{Nobs)-Emn{Nt)P{Nobs\Nt)\ 

MNobs) = ^T^^ . 19 

n{Nobs) 

Note that if matching was perfect, we would expect 
A = 0. The cost function is closely related to the pu- 
rity and completeness function. In particular, the cost 
function is a measure of how strongly the observed clus- 
ter sample deviates from having completeness and purity 
exactly equal to unity in the absence of pointing and pho- 
tometric redshift errors. As detailed in Appendix [Bl we 
find that the matching algorithm that worked best was 
one in which the richest halo is matched to the cluster 
with which it shares the largest number of galaxies. The 
halo and cluster are then removed from the halo and clus- 
ter catalogs respectively, and the procedure is iterated. 
Figure [T] demonstrates that the resulting cost function is 
below 20% at all richnesses when using this particular 
matching scheme, which immediately tells us that the 
purity and completeness of the sample are better than 
80%. 

3.4. Signal and Noise 

The observed fraction P{Nobs\Nt) of halos with Nt 
galaxies that arc matched to clusters with Nobs galax- 
ies is our estimator for the matrix element P{Nobs\Nt).^ 
Figure [2] shows the non-zero matrix elements of the esti- 
mated probability matrix. We can sec that this plot has 

* When performing the matching, it is important to keep in mind 
that the halo catalog should have a slightly smaller area and red- 
shift range than the corresponding cluster catalog. This is because 
due to pointing and photometric redshift errors a halo located near 
the boundary of the survey could be well matched by a cluster just 
outside said boundary. 
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the generic behavior we expected from ^ the major- 
ity of the non-zero matrix elements populate a diagonal 
band, with a few outliers which arise due to catastrophic 
errors in the cluster finding algorithm. We split the ma- 
trix P{Noiis\Nt) into a signal and a noise component as 
follows: first, we find the maximum-likelihood relation 
between Nobs and Nt, that is, we select the matrix ele- 
ments that maximize P{Nobs\Nt) at fixed Nt, restricting 
ourselves to the region Nt > 10 as this corresponds to the 
resolution limit of the simulations (ie, some halos which 
would contribute to lower Nt could be missed). The 
maximum- likelihood matrix elements are then fit with a 
line niNt) using robust linear fitting (jPress et al.lll992f ). 
The best-fit line is characterized by the two parameters 
Bq and (3 defined via 



miUNt) = 20exp{Bo){Nt/20f. 



(20) 



We now make the ansatz that the variance will, at 
least roughly, scale with this maximum likelihood rela- 
tion, which would be the case for a Poisson-like process. 
The signal band is then defined as all matrix elements 
(Nt, Nobs) such that 



Nobs > f^ML{Nt) - 5y'fiML{Nt), and (21) 
Nobs < IJiML {Nt) + lQ^j:^^^^dNt) (22) 

Non-signal halo-cluster pairs are defined to be noise. 
The signal/noise decomposition for Mock A can be seen 
in Figure [21 any matrix elements contained within 
the dashed lines are signal, and everything outside the 
dashed lines constitutes noise. The solid line going 
through the signal band is our best fit of the maximum- 
likelihood relation between Nobs and Nt- Plots for the 
probability matrix for the other two mocks are qualita- 
tively very similar. 

While the above procedure is ad-hoc, we emphasize 
that we are defining the signal band. At the end of the 
day, it does not matter how we came up with the above 
definition, what matters is whether the definition is a 
useful one or not. For our purposes, we have simply 
selected a straightforward algorithm that qualitatively 
does what we need it to do, that is, cleanly separate 
signal points from catastrophic errors. Other algorithms 
would certainly be possible and equally valid, and there 
will undoubtedly be a definition that works best in the 
sense that that the statistical errors from a maximum 
likelihood analysis using the likelihood from 311 would 
be minimized. In this work, we simply wish to adopt a 
working definition, and we demonstrate below that even 
with this simplest definition where signal and noise are 
separated by eye, our model likelihood correctly describes 
the data and we can successfully recover the cosmological 
and bias parameters with exquisite precision. 

3.5. Completeness 

Having separated our halo-cluster pairs into a signal 
and a noise component, our estimator c{Nt) for the com- 
pleteness function is simply the fraction of halos with 
Nt that are considered signal. In order to reduce the 
noise in these estimators, we also bin our halos into rich- 
ness bins by demanding that each richness bin contain 
at least 50 halos. The resulting estimated completeness 
function in Mock A is shown in Figure [31 as the solid cir- 
cles with error bars. Also shown in Figure |3l as triangles 
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Fig. 3. — The completeness function c{Nt) as measured in Mock 
A. Filled solid circles with error bars are the observed fraction of 
halos of richness A^t matched within the signal band in Figure |2] 
For comparison, we also show the fraction of halos matched to 
clusters of any richness as triangles. The best fit completeness, 
modeled as constant, is shown as a solid line, and the grey regions 
represents the 95% confidence interval in our completeness deter- 
mination. Dashed and solid lines are obtained for Mocks B and C 
respectively, which arc fully consistent with each other. 

is the fraction of halos matched to cluster of any rich- 
ness, regardless of whether the match constitutes signal 
or noise. We can see that for relatively rich systems with 
Nt ^25, essentially all halos are detected, but complete- 
ness differs from unity due to some of these halos being 
blended. Conversely, at low richness, the vast majority 
of detected halos constitute signal, but the detected frac- 
tion decreases with decreasing richness, and as a result, 
the completeness function is essentially flat. We found 
this to be the case in each of the mocks. 

We model the completeness function as a constant in- 
dependent of richness. Our best-fit model for the com- 
pleteness function is defined via minimization, and is 
seen in Figure |3l as a thick, solid line. Best fits for Mocks 
B and C are also shown as dashed and dotted lines respec- 
tively. Error bars for the minimization are assigned 
using the fact that the number of signal halos follows a 
binomial distribution with a detection probability c{Nt), 
from which we can compute the expected standard de- 
viation of the ratio of signal halos to all halos in a given 
richness bin. 

In order to determine whether our fit is a good fit, 
and to estimate the uncertainty in the best-fit complete- 
ness, we performed 10^ Monte Carlo realizations of our 
best-fit completeness model, and then treated these real- 
izations in the same way we treated our data. We found 
the x^ values measured in the mock to be consistent with 
our Monte Carlo x^ distribution. The 95% confidence in- 
terval for the completeness function in Mock A is shown 
in Figure 131 as a grey band. Each of the mocks have com- 
pleteness measures that are consistent with each other. 

3.6. Calibrating the Signal Matrix 

Calibration of the signal matrix Ps{Nobs\Nt) is essen- 
tially a trial and error game since we have no a priori 
expectation for the form of Ps{Nobs\Nt). Note, however, 
that so long as we parameterize Ps{Nobs\Nt) in a way 
that is statistically consistent with the simulation con- 
straints, it does not matter how we came up with our 
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Fig. 4. — distribution of 10* Monte Carlo realizations of our 
best-fit model of the signal matrix Psi^obsl^t) for Mock A. The 
value observed in the mock is marked by a thick, solid line at 
the bottom of the plot. We find that our model for Ps{Afo6sl^t) is 
indeed a good fit to the mock catalog data. This was the case for 
Mocks B and C as well. 

particular parameterization. Rather than discussing the 
various iterations we went through to find a successful 
model for Ps{Noiis\Nt), we have chosen to simply state 
our model, and then demonstrate that our parameteri- 
zation is flexible enough to fully accommodate the sim- 
ulation data. Our model for the probability matrix is 

Ps{Nobs\Nt) oc eri{x max 

) - erf(a;mi„) (23) 

where a;™„ = {Nobs - f^jNt) + 1)/ ^2Var(iVt), x^ax = 
[Nobs - M(iVt) + 2)/v/2Var(iVt), and 

n{Nt) = 20exp(Bo + 0.14)(7Vt/20)(^-°-i2)(24) 
Var(iVob.|iVt) -exp(-3Bo + Bi)ti{Nt). (25) 

The factor of 20 is simply our chosen pivot point. Note 
that Bq and /3 were defined earlier in equation [20l so 
that the only new parameter being introduced is _Bi, 
which characterizes the variance of Ps{Nobs\Nt)- The 
appearance of — 3i3o in the expression for Var(A'^t) de- 
correlates Bq and Bi, whereas the additive constants 0.14 
and —0.12 in the expressions for /i(iVt) were empirically 
determined and characterize the difference between the 
mean value {Nobs\Nt) = fJ-iNt) and the maximum likeli- 
hood value fJ.ML{Nt) from equation [20l Finally, the pro- 
portionality constant in equation 1231 is set by demanding 
that the sum of all matrix elements over the signal band 
be equal to unity. 

We now demonstrate that this parameterization does 
indeed provide a good fit to the mock catalogs and esti- 
mate the uncertainties in our best fit parameters. To do 
so, we first find the best-fit value for Bi by minimizing 
and assuming a binomial distribution for computing 
the error bars for each matrix clement. We then com- 
pare the distributions obtained from 10^ Monte Carlo 
realizations of our best-fit model for each of our mocks 
to the value observed in the mocks directly. Figure 
m illustrates our result in the case of Mock A. It is clear 
from the figure that the model is indeed a good fit to the 
simulation data. This is true of Mocks B and C as well.^ 

^ An exact comparison between our Monte Carlo realizations 
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Fig. 5. — 95% confidence regions for the Bq and /3 parameters 
(top) and Bq and Bi (bottom) in each of our three mock catalogs. 
The dashed and solid contours are for Mocks A and B respectively. 
The shaded contours are 68% and 95% confidence regions in Mock 
A. The small filled circles mark the best-fit parameters from the 
mock catalogs, and were used to generate the Monte-Carlo real- 
izations from which the confidence regions are derived. With the 
exception of /3, the best-fit parameters in each of the mocks are 
clearly not consistent with each other, and represent a large sys- 
tematic error. The solid line in the top panel corresponds to the 
value fS = 1.18, while the dashed lines mark the assumed la error 
A/3//3 = 5% in this work. The dotted lines in the top panel and 
the soli d lines in the bot tom panel mark the assumed la region 
used in lRozo et al.l 1 120071) . 

The top plot in Figure [5] shows the 95% confidence re- 
gions of the parameters Bq and /3 in Mocks A, B, and 
C. The corresponding regions for the parameters Bq and 
Bi are shown in the bottom panel. It is evident that the 
best-fit parameters Bq and Bi in each mock arc not fully 
consistent with each other, and that this variation rep- 
resents a large systematic uncertainty in the probability 
matrix Ps{Nobs\Nt)- Nevertheless, the slope f3 appears to 
be robustly constrained, with roughly /? = 1.18±5% (Icr). 

and the mocks suffers from the fact that while the mocks suf- 
fer from completeness being different from unity, whereas our 
Monte Carlo models of the probability matrix do not. Since 
Pa (A^oijs I A^i) = P{Nai,s\^t)/c{Nt), there is somewhat of an am- 
biguity as to whether in comparing the two concerning whether we 
should inflate the error estimates of the Monte Carlo realizations 
by a factor of 1/c as we do for the simulation data. Fortunately, 
since the completeness is close to unity, this ambiguity does not 
alter the distributions much (we checked this explicitly). 
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Fig. 6. — Fraction of clusters matched to a halo within the signal 
band. Filled circles are the fraction measured in Mock A. The thick 
solid line is the best fit to the data, and the grey band represents 
the 95% confidence band of the model. Dotted and dashed lines 
are the best fits for Mocks B and C respectively. For reference, we 
also show with triangles the fraction of clusters matched to halos 
of any richness. 

Early analysis of some recent simulations suggests that 
the scatter in f3 is in fact larger than this, and the mean 
is somewhat lower, around (3 « 1.1, though a complete 
study of these simulations has not yet been completed. 
In what follows, we simply assume f3 = 1.18 ± 5% unless 
noted otherwise, though we note we use a much more 
conservative prior P = 1.18 ± 15% when analyzing the 
maxBCG cluster catalog constructed from SDSS data. 

3.7. Purity 

The purity function represents the fraction of clusters 
that are not well matched to a halo (i.e. that fall outside 
the signal band). Calibration of the purity function is 
thus completely analogous to the completeness function 
provided we switch the role of halos and clusters. That 
is, thinking of clusters as input and halos as output, we 
follow the exact same procedure we used to define the 
completeness function in order to define the purity func- 
tion. Our estimate for of the purity function in Mock A 
is shown in Figure [H Also shown as a solid line is our 
best-fit model, which we have chosen to parameterize as 

p{Nobs) = exp(-x(7Vo6.)') (26) 

where 

The factor of ln(15) is there simply to de-correlate po 
and pi . The best-fit models for Mocks B and C are also 
shown as dashed and dotted lines respectively. Finally, 
as with completeness, we generate 10'^ Monte Carlo re- 
alizations to estimate the 95% confidence regions of our 
best-fit parameters, and to test whether the model is a 
statistically acceptable fit to the data. We find that this 
is indeed the case, and the corresponding 95% confidence 
band for Mock A is shown in Figure [5] as a grey band. 
The three mock catalogs are only marginally consistent 
with each other, but the purity is quite high in each of 
them. 
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Fig. 7. — Distribution of the photometric redshift bias param- 
eter b = Zc/zi^ for Mock A. The distribution Pi,{b) is richness and 
redshift independent to a good approximation, and is well fit by 
a Gaussian, as shown above. The thick solid line at the bottom 
represent the best-fit value for the average bias. 

3.8. Photometric Errors Calibration 

We now calibrate the probability distribution p{zc\zh), 
that is, the distribution of photometric cluster redshift 
estimates in terms of the true halo cluster redshift. 
We characterize the photometric redshift distribution in 
terms of the redshift bias parameter b = Zc/zh- The 
probability distribution pb{b\zh) is related to the proba- 
bility distribution p{zc\zfi) via 

p{zc\zh,) = —pb{b\zh). (28) 

The advantage of working with ptib) is that b correlates 
only very weakly with halo redshift z^. Indeed, we found 
that the cross correlation coefficient between b and z^ 
in our mock catalogs was < 0.1. Moreover, we found 
the cross correlation between b and Nobs to be equally 
weak, so taking Pb(b) to be richness independent is a 
good approximation for the maxBCG cluster catalog at 
richness Nobs > 10 (the richness range that will be used 
for cosmological constraints). 

Figure [7] shows the distribution of bias parameters for 
each halo-cluster pair in Mock A. The distribution p{b) 
is seen to be well fit by a Gaussian, and is thus com- 
pletely characterized by the average bias parameter (6) 
and its standard deviation crt. The best-fit parameters 
for each of our mocks are (6) = 1.00, 1.02, and 1.03 and 
(76 0.04,0.03, and 0.05 for Mocks A, B, and C re- 
spectively. These determinations have effectively zero 
statistical error; here again systematic variations from 
realization to realization represent our main source of 
uncertainty. 

4. TESTING THE MODEL 

We now test whether our model can successfully repro- 
duce the observed number counts in the mock catalogs. 
Even more importantly, we test whether we can success- 
fully recover the cosmological and HOD parameters of 
the simulations with use of the likelihood function from 

t Throughout this section, we use the iJenkins et al.l 
)0l[ ) parameterization of the halo mass function. The 
linear power spectrum is computed using the l ow baryon 
transfer functions from lEisenstein fc Hul ()1999D with zero 
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Fig. 8. — Comparison between the cluster counts measured in the 
mocks and our model predictions. Solid, dotted, and dashed curves 
represent Mocks A, B, and C. For clarity, we have also displaced 
mocks C and B to the right by a factor of 2 and 4 respectively. 
Error bars on the model values are obtained from the diagonal 
terms of the correlation matrix, and are roughly uncorrclatcd. 

neutrino masses, and the initial power spectrum is as- 
sumed to be a Harrison- Zeldovich spectrum. Flatness is 
also assumed, and all cosmological parameters are held 
fixed except for erg, Q„n and h. Allowing other parame- 
ters to vary should have o nly a minor impact on our re- 
sults as it has been shown ([White et al.lll993tlRozo et al.l 
|2004|) that local halo abundances are most sensitive to 
these three parameters. 

4.1. Cluster Counts Comparison 

We begin by comparing the cluster number counts in 
each of our three mocks to our model predictions us- 
ing the simulation-calibrated values for all eight nuisance 
parameters (one completeness, two purity, two photo-z, 
and three signal matrix parameters). The input cosmol- 
ogy and HOD are taken directly from the mock catalog. 
There are, however, two important points concerning the 
mocks which we would like to highlight. First, the vari- 
ance in the number of galaxies in halos of a given mass 
is somewhat larger than Poisson. Consequently, in this 
section we take P{Nt\m) to be Gaussian, and calibrate 
the relation between Var(A^t|m) and {Nt\m) directly from 
the mocks. Secondly, halo masses in the simulation were 
defined at an overdensity A = 200 with respect to crit- 
ical, whereas our model requires masses to be measured 
at an overdensity of 200 relative to the mean matter den- 
sity. We transform t he masses ac cordingly using the fit- 
ting functions from iHu fc Kravt sov (2003). Our final 
uncertainty in the fitted values for the HOD is w 3% as 
estimated by examining the sensitivity of our best-fit pa- 
rameters to the number of bins used for the calibration 
and the minimum mass cut considered when fitting the 
HOD. We will see below that this accuracy is comparable 
to the statistical uncertainty with which we can recover 
the best-constrained modes in parameter space. 

Figure [8] shows the cluster number counts in each of 
our three mock catalogs as well as our model predictions. 
The error bars associated with the model arc simply the 
square root of the diagonal terms in the correlation ma- 
trix, and we have selected bins wide enough for the error 
bars to be roughly de-correlated. The agreement be- 



tween the model predictions and the observed number 
counts in the mocks is excellent. Note that this agree- 
ment is not trivial. While it is true that our mass func- 
tion is calibrated to the simulations, agreement between 
our prediction and the direct measurement in the mocks 
is only assured if our model successfully parameterizes 
the cluster-finding algorithm selection function. Figure 
[8] demonstrates that this is indeed the case, and that any 
systematics in the data have been properly taken into 
account. 

4.2. Parameter Constraints for a Known Selection 
Function 

We wish to test now whether we can successfully 
recover the cosmological and HOD parameters of the 
simulation with the use of the likelihood function con- 
structed in ^ To do so, we use a Monte Carlo Markov 
Chain (MCMC) method to evaluate the likelihood func- 
tion in parameter space and estimate the correspond- 
ing 68% and 95% confidence confidence contours of the 
likelihood function in parameter space. Details of our 
MCMC implementation, wh ich draws heavily on the 
work bv lDunklev erall(|2005h . can be found in Appendix 
[Cl Throughout this paper, we consider cluster num- 
ber counts binned in nine logarithmic bins between rich- 
ness Nobs = 10 and Nobs = 100 within a single redshift 
slice {[zmin, Zmax] = [0.12,0.25]. Wc chose this binning 
as a compromise between having enough bins to accu- 
rately resolve the shape of the cluster richness function, 
while at the same time ensuring that every bin contained 
> 10 clusters. This last property is desirable since the 
Gaussianity assumption of our likelihood function breaks 
down if the number of cluster within a given bin becomes 
too low. 

We begin our analyzes by estimating the likelihood 
function while holding all of our nuisance parameters 
fixed to the simulation- calibrated values. That is, we as- 
sume we have perfect knowledge of the cluster selection 
function. This is useful for two reasons: first, it allows us 
to test whether our model likelihood successfully recov- 
ers the input cosmology and HOD parameters when the 
cluster selection function, that is, the probability matrix 
P{Nobs\Nt), is fully cafibrated. In addition, investigating 
this case gives gives us a baseline for evaluating how well 
the signal matrix parameters must be calibrated before 
the quality of our parameter constraints decreases signif- 
icantly. We should also note that, by and large, holding 
the cluster selection function fixed is a standard assump- 
tion in most analyses of cluster abundances. One of the 
most powerful features of our method is that, as we shall 
see in §4.31 it allows us to marginalize over uncertainties 
in the selection function. 

As expected, wc find that there are strong degeneracies 
between between cosmology and HOD parameters. This 
is illustrated in Figure[9l where we plot the 68% and 95% 
confidence regions in the a — tJg plane for Mock A. All 
three mocks give similar results. The degenerate param- 
eter combination is roughly a^as = constan t, in agree- 
ment with the Fisher matrix estimate from iRozo et al.l 
(|2004[ ). Also shown in this fi gure with a small circle 
with error bars are the known value of the parameters 
in the simulation; the error bars represent the ~ 3% un- 
certainty in our direct measurement of the HOD param- 
eters. Clearly, to within the degeneracies intrinsic to the 
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Fig. 9. — Filled contours arc the 68% and 95% confidence regions 
in the a — ag plane recovered for Mock A when holding all nuisance 
parameters fixed. The true parameters are marked as a small cir- 
cle with error bars. We find that our likelihood model successfully 
recovers the simulation parameters to within the degeneracies in- 
trinsic to the data. Also shown with solid curves are the 68% and 
96% confidence regions obtained when using CMB and supernova 
like Gaussian priors AQmh'^ = 0.01 and Ah = 0.05. The dot- 
ted contours are obtained by marginalizing over all completeness, 
purity, and photo-2 parameters, and assuming 10% priors on the 
signal matrix parameters (see ^4.3l for discussion). The dashed line 
marks the expected degeneracy direction o^crg = constant, while 
the error bar centered on the true values of the simulation repre- 
sents the 3% error on a from the direct measurement of the HOD. 
Results for the three mocks considered were all very similar. 

method, wc successfully recover the input cosmology and 
HOD. 

In light of the strong degeneracies inherent to the data, 
we focus now on the directions in parameter space that 
are best constrained by the data. These are defined by 
diagonalizing the parameter correlation matrix as esti- 
mated from the MCMC output. The best-constrained 
modes are those for which the eigenvalues are smallest. 
In the case of the Mock A, the top two normal modes 
are 

XI =a°-9Vr'(^^™/Mi)"-35(f]„Mi)-"-"^/i-"-^^ (29) 
x,^a'-<^'a,'>-'%nm/A'hr'\nMr''h"-'-\ (30) 

Note the first parameter is essentially a cluster normal- 
ization condition, but with the Hubble and HOD param- 
eters included. Moreover, it clearly reflects the expected 
flm/Mi = constant degeneracy intrinsic to the halo mass 
function, though it is slightly modified due the weak sen- 
sitivity of the survey volume to flm- The second eigen- 
vector does not have a simple interpretation (though see 
Appendix in lRozo et al.l[200^ .^° Hereafter, we refer to 
the top normal model in parameter space as the gener- 
alized cluster normalization condition. 

We show the 68% and 95% confidence regions of these 
two parameters for Mock A in Figure [TUl Wc find that 
not only do we indeed recover the correct simulation pa- 
rameters, but that the associated statistical uncertainty 
is extremely small, of order 1% and 4% for the top two 
normal modes for our assumed survey of 1/8 of the sky 

Curiously, we note that the top normal mode does not contain 
the degeneracy direction a^crg alluded to earlier. Rather, both of 
the top two eigenmodes have considerable a and erg dependence, 
so the a^ag degeneracy is only recovered after marginalizing over 
all other parameters. 
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Fig. 10. — Filled contours are the 68% and 95% confidence 
regions in the a — erg plane for the the two best-constrained pa- 
rameter combinations. The circle with error bars marks the input 
simulation parameters, and the error bars are a 3% uncertainty 
from the the HOD fits to the simulation. Again, we find that our 
likelihood model successfully recovered the simulation parameters. 
Note that this is a very stringent test: the 1 — a error bars for 
these two top normal modes are 1% and 5% respectively. Also 
shown above with dotted curves are the 68% and 95% confidence 
regions obtained when using WMAP and supernova like Gaussian 
priors Af2| = 0.01 and Ah = 0.05. The apparent increase in the 
confidence regions is due to a slight rotation of the likelihood func- 
tion in parameter space due to the introduction these cosmological 
priors. 

and redshift ranges z G [0.13,0.25]. Given the small size 
of our error bars, the excellent agreement between our 
statistical analysis and the true simulation parameters 
is highly non-trivial. In particular, it explicitly demon- 
strates that if the selection function for the maxBCG 
catalog can he tightly constrained, optically-selected clus- 
ter samples can provide percent-level determinations of 
specific combinations of cosmological and HOD parame- 
ters. 

It is also worth investigating to what extent our con- 
straints can be improved upon through the use of other 
cosmological probes. In particular, t he CMB places 
strong constraints on flmh (see e .g. iHu et al.lTl997l: 
iHu fc DodelsM]|[200l iDodelsonI 120031) . while supernovae 
data puts strong con straints on the value o f the Hubble 
parameter h (see e.g. iFreedman et al.ll200ll ). The reason 
these two particular priors are interesting is that their 
values have minimal or no dependence on the dynamical 
nature of dark energy. That is, these constraints do not 
depend on whether dark energy is a cosmological con- 
stant or not. Consequently, employing these priors still 
allows us to use cluster abundances for studying the dark 
energy. Note that this is not the case for all priors. For 
instance, the CMB data can also provide priors on cg- 
provided the power spectrum at last scattering is extrap- 
olated to the present epoch using a ACDM cosmology. 
Clearly, such a prior is useless if one is interested in con- 
straining the behavior of the dark energy. Indeed, this 
is precisely why estimating erg is an interesting problem: 
deviations from the CMB interpolated value for erg could 
signal a failure of the ACDM model. 

To investigate the impact that CMB and supernova 
like priors can have on our results, we repeat the above 
analysis, but including now Gaussian priors of width 
Ailrnh^ ~ 0.01 and Ah ~ 0.05 centered on the simu- 
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lation cosmology. The width of these priors is set by 
the current uncertainty in each of the two cosmological 
parameters a s constrained by th e CMB and supernovae 
respectively (ISpergel et al.l [20061 ). We find that includ- 
ing these cosmological priors has minimal impact on how 
well our normal modes are constrained. This is shown in 
Figure [HI where we find that the — erg degeneracy 
is only marginally reduced. Even more telling is Figure 
[TUl where we show the confidence regions of the normal 
modes found in the no priors case. Note that the confi- 
dence regions slightly increase rather than decrease due 
to the rotation of the likelihood function in parameter 
space due to the introduction of the priors. Indeed, we 
find that including priors in our analysis results in not 
just two but three highly constrained modes at roughly 
1%, 2%, and 5% accuracy. The first and the third are 
almost identical to the normal modes found in the no 
priors case, and the ones shown in Figure [TOl The sec- 
ond normal mode, on the other hand, is largely parallel 
to the direction of our priors. 

In summary, we have found that local cluster abun- 
dances estimated from large surveys can provide percent- 
level constraints on combinations of cosmological and 
HOD parameters if the selection function, i.e. complete- 
ness, purity, and the signal matrix, is known precisely. 
Individual parameters cannot be constrained due to in- 
trinsic degeneracies in the data. Finally, adding cosmo- 
logical priors from CMB and/or supernovae has minimal 
impact on the best-constrained parameter combinations. 

4.3. Marginalizing Parameter Constraints Over 
Uncertainties in the Selection Function 

Consider now marginalization over uncertainties in the 
selection function. We found that the completeness, 
purity, and photometric redshift error parameters were 
well constrained from the simulations, and varying them 
through the range of values measured from the simu- 
lations had minimal impact on the estimated number 
counts. Indeed, upon adopting top hat priors corre- 
sponding to the 95% regions of these parameters we find 
that our results are largely identical to the ones presented 
in ij4.2l Consequently, henceforth every result we present 
is marginalized over the completeness, purity, and photo- 
z parameters. 

The signal matrix parameters, on the other hand, arc 
a different story. We saw in ii3.6l that the slope /3 of the 
mean relation between Nt and Nobs was relatively well 
constrained, and that a Gaussian prior on /3 of the form 
f3 = 1.18 ± 5% appears reasonable based on this set of 
simulations. Consequently, unless specially stated oth- 
erwise, we shall assume this prior in all of our analysis. 
We also saw, however, that the amplitudes Bq and Bi 
of the signal matrix had large systematic errors. In fact, 
these errors are large enough that attempts to constrain 
cosmology and HOD in the simulations using priors that 
covered the whole range of selection functions observed 
in the simulations proved unsuccessful. We have thus 
chosen to investigate how more moderate uncertainties 
in the amplitudes affect our results, with an eye towards 
future work which may improve our understanding of the 
cluster selection function. To do so, wc ran MCMCs with 
5%, 10%, 15%, and 20% priors on the signal matrix pa- 
rameters Bq and Bi using the observed number counts 
in Mock A. Due to the computational effort involved in 
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Fig. 11. — Sensitivity of the error of the best-constrained normal 
mode in parameter space to uncertainties in the cluster selection 
function. This mode corresponds to a generalized cluster normal- 
ization condition, and both its amplitude and direction are fairly 
robust for up to ^ 15% uncertainties in the cluster selection func- 
tion, characterized in this case by the amplitudes Bo and Bi (see 
i|3.6l l. Filled circles assume a 5% prior on the slope /3, while the 
triangle is obtained assuming a 10% prior on 13. Finally, the square 
marks the error on the generalized cluster normalization condition 
when including CMB and supernova like priors on Clmh^ and h 
(compare to Figure |9] The slight increase in the uncertainty is due 
to the change in orientation of the likelihood function. 

running MCMCs for each model we consider, we focus 
on Mock A only. There is no particular reason why this 
realization was chosen over the other two, and, based on 
our results from the previous section, wc have no reason 
to suspect that any one realization would lead to sub- 
stantially different results than the other two. 

Because of the large degeneracies in parameter space, 
we have chosen to focus on the two best-constrained 
modes in parameter space to quantify the sensitivity of 
our results to uncertainties in the values of Bq and Bi. 
We find that the best-constrained normal mode is robust 
to w 15% uncertainties in the selection function. In par- 
ticular, the direction of the mode remains constant, and 
the uncertainty in the parameter increases linearly with 
the width of the assumed priors, as illustrated in Figure 
[TTl By 20% uncertainties in the amplitudes Bq and Bi , 
however, the top mode has rotated away slightly, and its 
uncertainty starts growing faster than linearly with the 
width of the amplitude priors. Nevertheless, it is remark- 
able that even with uncertainties as large as 20% in the 
cluster selection function we can recover the top normal 
mode in parameter space to better than 5% accuracy. 

To investigate how sensitive our results were to our as- 
sumptions about P, we also considered the case in which 
all signal matrix parameters (including the slope (3) were 
known to 10% accuracy. The corresponding error on the 
top normal mode for this case is shown in Figure [TT] as 
a triangle, and demonstrates that there is little loss of 
information by the additional uncertainty in /3. It is 
worth noting that the two model parameters that are 
most closely aligned with the top normal mode are a 
and (Tg • Consequently, constraints in the a — as are rel- 

Marginalizing over the signal matrix parameters in our anal- 
ysis also gave rise to numerical difficulties in the realization of the 
MCMC. A description of these problems and how they were over- 
come is given in Appendix [C] 
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ativcly robust to uncertainties in the signal matrix, as 
shown in Figure [9l 

We now turn our attention to the behavior of the sec- 
ond best-constrained mode, which we find is not stable 
to uncertainties in the signal matrix. Specifically, wc find 
that there is a factor of two increase in the error of this 
mode in going from fixed nuisance parameters to 5% un- 
certainties in the signal matrix. Moreover, the direction 
of the second best-constrained normal mode is substan- 
tially different between the two cases. Curiously, as wc 
increased the width of our prior on the amplitudes Bq 
and Bi , we found that the second best-constrained mode 
remained relatively constant both in direction and width, 
suggesting the large difference between the fixed nuisance 
parameter case and the 5% priors case was driven largely 
by the uncertainties in the slope /3. We tested this sce- 
nario by running an additional chain with 10% priors 
on all signal matrix parameters, and found that, indeed, 
with the new priors for (3 the second best-constrained 
mode was severely affected, both in terms of the direc- 
tion and the percent-level accuracy with which it could 
be recovered. We conclude that there is a large degener- 
acy between cosmological and HOD parameters and the 
slope p. This is an important, though rather unfortu- 
nate, result, as it implies that to fully recover the con- 
straining power of large local cluster samples, the slope 
of the relation between Nobs and Nt must be known to 
high accuracy. 

In light of these results, it is worth returning to the 
question of whether or not CMB and supernova pri- 
ors on cosmological parameters could substantially al- 
ter our conclusions. To test this, we ran an addition 
MCMC using CMB and supernova like Gaussian priors 
Aflm/i^ = 0.01 and Ah ~ 0.05, assuming 15% uncertain- 
ties in Bq and Bi, and our default 5% level uncertainty 
in (3. We mark the corresponding uncertainty on the 
cluster normalization condition in Figure [Til as a square. 
Note that the error on the cluster normalization condi- 
tion slightly increases upon inclusion of the prior. This 
is again indicative of an overall distortion of the orien- 
tation of the likelihood surface in parameter space upon 
inclusion of the priors. Indeed, upon including the priors, 
we found that the best-constrained mode was no longer 
the cluster normalization condition, but rather falls close 
to the direction of the assumed priors. The generalized 
cluster normalization condition then becomes the sec- 
ond best-constrained eigenmode, and was itself slightly 
rotated relative to the fixed nuisance parameters case. 
The next best-constrained eigenmode was found to be 
unstable to the introduction of cosmological priors. 

5. SUMMARY AND DISCUSSION 

In this work we have introduced a general framework 
for characterizing the selection function of optical cluster 
finding algorithms. The fundamental assumption in our 
method is that the scatter in the mass-observable rela- 
tion for a cluster finding algorithm can be split into an 
intrinsic scatter, and an observable scatter due to the im- 
perfection of the cluster finding algorithm. We show that 
the inability to fully characterize catastrophic errors in 
richness assignments naturally gives rise to the concepts 
of purity and completeness in quantitative form. These 
definitions of purity and completeness are well defined 
and are particularly well suited to cosmological abun- 



dance analyses. 

This method could potentially be applied to character- 
izing the selection function and to cosmological param- 
eter estimation for a wide range of current and future 
cluster samples. Here, we have demonstrated its utility 
by application to the maxBCG cluster finding algorithm 
(|Koester et al.ll20"07aD , run on mock galaxy catalogs pro- 
duced using three different realizations of the ADDGALS 
prescription for connecting a realistic galaxy population 
to large dissipationless simulations, of c omparable vol- 
ume to the SDSS data sample ( detailed inlWechsler et al.l 
[2001 . In a companion paper (|Rozo et al.ll2007T ). we ap- 
ply this metho d to the SDSS maxBCG cluster catalog 
(jKoester et "alll2007bf ). 

By matching the input halos to the detected clusters, 
we have quantitatively calibrated the maxBCG selection 
function in each of the three mock catalogs, and demon- 
strated that with knowledge of this selection function 
we can accurately recover the underlying cosmology and 
HOD parameters of the simulations to within the intrin- 
sic degeneracies of the data. Moreover, we have shown 
that this is still the case when the selection function is 
only known to « 15% accuracy, though the uncertainty 
in the recovered parameters starts growing quickly after 
that. 

We conclude that it is possible to provide tight cos- 
mological and HOD constraints using optically-selected 
cluster catalogs, but doing so requires a better con- 
strained cluster selection function that we currently have. 
This is an important and non-trivial result: it explic- 
itly shows that the popular view that projection effects 
present an insurmountable obstacle for precision cos- 
mology with with optically-selected cluster catalogs is 
no longer the case. We have demonstrated that the 
maxBCG cluster catalog is highly complete and pure, 
and, more importantly, that any such effects can be 
incorporated into our cosmological parameter analysis 
through a detailed calibration of the cluster selection 
function. Provided the selection function is known with 
relative accuracy, optical cluster catalogs can be useful 
tools for precision cosmology. 

In the present work, we have not made an exhaustive 
attempt to characterize the uncertainties in the clus- 
ter selection function. Here, we investigated three re- 
alizations of the empirically-motivated galaxy biasing 
scheme ADDGALS, applied to one cosmological model, 
the large, low resolution Hubble Volume simulation. Al- 
though all three realizations provide a reasonable rep- 
resentation of galaxies in the local Universe, including 
realistic luminosity and color evolution and clustering 
properties, and a red sequence population that is a good 
match to maxBCG, the three simulations had different 
HOD descriptions. Our results suggest that the cluster 
selection function depends to some extent on the specific 
HOD of the simulation, at least for the richness measure- 
ments we considered. To mitigate this uncertainty in our 
analysis of the maxBCG data, in lRozo et al.l (|2007t l. we 
perform the analysis assuming only that the shape of the 
selection function is the same in both the simulations and 
the data, and greatly relaxing the prior on the slope /3 
relating the mean observed and intrinsic richness. 

We end now by considering the obvious question: can 
this situation improve? Wc remain optimistic that future 
work will allow tighter and more robust constraints on 
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the maxBCG selection function than are presented here, 
which wiU allow us to maximize the power of the large 
maxBCG data set. We are proceeding along three fronts: 

• Improved richness definitions. A robust calibration 
for P{Nohs\Nt) for arbitrary definitions of Nt and 
Nobs may indeed be hard to come by, it is entirely 
plausible that we can refine our definitions of halo 
and cluster richness to considerably improve our 
understanding of the selection function. For in- 
stance, in this work, no attempt was made to make 
Nohs an unbiased estimator of Nt, so something 
as simple as including a color cut in our defini- 
tion of Nt could significantly improve our model. 
If one can define iVobs such that, by construction, 
{Nobs\Nt) = Nt, not only will the number of nui- 
sance parameters immediately go down by two, but 
also some of the large degeneracies we uncovered in 
this work will become irrelevant. 

• A detailed characterization of the variance be- 
tween a range of models. A crucial question is 
whether the selection function calibration is robust 
to changes not only in the halo occupation of the 
galaxies but also in the cosmological parameters of 
the underlying simulation. Although we didn't ex- 
plore this directly in the mocks investigated here, 
we were generous in the range of galaxy popula- 
tions applied to the simulations. Scatter between 
selection function parameters will likely go down if 
we apply further observational constraints on the 
galaxy populations. We then intend a wide explo- 
ration of parameter space after these constraints 
have been applied. 

• The addition of mass calibration data from 
maxBCG itself. Information on the mass scale 
is directly ava ilable from both st acked lensing 
measurements (jSheldon et al.l l2007l Johnston et 
al, in preparation) , stacked X-ray measurements 
(|Rvkoff et al.l |2007[ ). and from the velocity dis- 
persi ons of the galaxies in clusters (|Becker et al.l 
|2007[ ). These data provide substantial additional 
constraints on combinations of our selection func- 
tion parameters, and will allow us to use weaker 
priors in both the selection function and cosmolog- 
ical parameter space. 
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APPENDIX 
SUMMARY OF EQUATIONS 

For reference, we summarize below all the formulae that quantitatively describe our model. Let N = {A^i, ...,Nm} 
be the number of clusters in richness bins 1 through M, and p be the parameters characterizing cosmology, HOD, 
purity, completeness, and the signal matrix Ps{Nobs\Nt)- The likelihood function £(p|N) is given by 

£(p|N) cc exp (~i(N - (N)) • C'^ ■ (N - (N))| . (Al) 



^(27r)A^det(C) 



Above, (N) is the expectation value for the data vector, and C is the correlation matrix of the observables. The 
number of clusters in a given richness bin a is given by 

{Na) = j dznd?nn {na) {^W) (e|n^) (A2) 

where x is the comoving distance to redshift Zh- The function {'^\zh) is the effective redshift selection function of the 
survey, and is given by 

{ip\zh) = J dzc p{zc\zh)f{zc) (A3) 

where if{zc) = 1 if the photometric cluster redshift Zc € [zmin, Zmax] and ^p{zc) = otherwise, and Zmm and Zmax 
define the redshift selection criteria for the survey. Likewise, the function {Q\hh) is the effective angular mask of the 
survey, and is given by 

(ein,.) / d^hMi^c)pin,\hh) (A4) 
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where 0(nc) is the angular mask of the survey. In this work, we have assumed p(hc\nh) — S{iic — n^). Finally, (ha) 
is the expected comoving cluster number density as a function of redshift, and is given by 

(na) = f dm^ ((zi)alv\m) . (A5) 
J dm 

where (dn/dm) is the halo mass function, and the quantity (apa/plm) represents the mass binning of the survey, given 

by 

{c^Pa/p\m) = c{Nt)MNt)P{Nt\m). (A6) 

Nt 

where c{Nt) is the completeness function, and P{Nt\m) is the HOD. tpa is the binning function in terms of Nt, which 
is related to the top hat richness binning function ipaiNots) via 

MNt) = E %#^^.(^o..|iVO. (A7) 

where the sum is over all Nobs values within the signal band, p{Nobs) is the purity function, and Ps{Nobs\Nt) is the 
signal matrix. 

We also summarize below the equations describing the correlation matrix C. In particular, the Poisson contribution 
to the correlation matrix takes on the form 

iCa,a')p = Sa.a' J (x' ^^dz,Jh,,) J dm^ {c^a/p'\m) {Q\hh) {^{z^) . (A8) 

Note that, schematically, this contribution takes on the form Ca.a' ~ Sa^a' (Na) /p- This is exactly what we would 
expect: if TV is the number of observed clusters and Ng the number of signal clusters, then N = Ng/p, so Var(iV) = 
Var(A^s)/p^ = N/p. To this Poisson contribution we must also add the sample variance term 

{Ca,a')s = {bGNa) {bGNa') a\v) (A9) 

where 



{bGNa) - V 



dm b(m) | ^ (ci/'a/pl'Ti) G 
dm 



(AlO) 



V is the survey volume, and u'^iV) is the rms variance of the linear density field over the sample volume probed. 
An additional contribution to the correlation matrix arises from the stochasticity of the mass-richness relation, which 
leads to uncertainties in the precise mass binning of the cluster sample. This contribution takes on the form 



iCa,a')B - / dzhdW {v\zh) (e|n,) {^\zh) (All) 

dzh 

dm ^ [5a,a' {cipa/p^\m) - {ciPa/p\m) {cTPa'/p\m)] . (A12) 

Note that in the above expression it is evident that neighboring bins are always negatively correlated, reflecting the 
fact that halos scattering into a given bin a must have scattered out of some other bin a' . Also, there is an additional 
contribution to do photometric redshift uncertainties, which reduces to 

{Ca.a')z = 5a,a' I dzud^hu ^ {Q\nh) \{v\zh) - {^\zhf] f dm ^ {c^Pa/p^\m) . (A13) 



dz 



h 



dm 



Finally, there are uncertainties due to the stochastic nature of the purity function, which take on the form 

dzh— {(p\z) / dm-j^ (cV'a(l - p)/p\m) . (AM) 
dzh J dm 

Note that when the purity is exactly equal to one, the purity contribution to the correlation matrix vanishes, as it 
should. 

MATCHING ALGORITHMS GONSIDERED IN THIS WORK 

The type of membership matching algorithms we considered can be broadly grouped into one of two categories: 
deterministic matching algorithms and probabilistic matching algorithms. We considered four deterministic matching 
algorithms: 

1. Maximum shared membership matching: Each halo is matched to the cluster containing the largest number of 
galaxies belonging to the halo. 

2. BCG matching: A halo is matched to the cluster containing the halo's BCG. 
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3. Exclusive maximum shared membership matching: Halos are first rank ordered according to their richness Nf. 
Starting with the richest halo, its cluster match is found through maximum shared membership matching. The 
matched cluster is then removed from the list of candidate matches for all other halos, and the procedure is 
iterated. 

4. Exclusive BCG matching: This is exactly analogous to exclusive maximum shared membership, only BCG 
matching is used to match halos to clusters at each step. 

In addition to these four matchings, we considered what we have called probabilistic matching schemes, which are 
essentially generalizations of the deterministic matching schemes. The idea is as follows: imagine listing all halos and 
all clusters in a two column format, so that the left column contains all halos and the right column contains all clusters. 
Pick a cluster a, and draw a line connecting halo a to all clusters /3 which share members with halo a. Let then fafs be 
the fraction of galaxies in halo a contained in cluster (3. Maximum shared membership matching consists of selecting 
the cluster /3 which maximizes fajs with probability one. More generally, however, one could imagine replacing this 
probability with some other probability function pifap), an obvious choice being p{fa(3) = fap which we refer to as 
proportional random matching. Another possible choice is random membership matching, where we set p{faf3) = ^/Na 
where Na if the total number of candidate cluster matches for halo a. 

Consider now the probability matrix P{Nobs\Nt) in the case of probabilistic matching. One has that 

P{Nobs\Nt) =Y,PiNobs\a)Pia\Nt) (Bl) 

a 

where P{Nobs\o.) is the probability that halo a be matched to a cluster with Nobs galaxies, and P{a\Nt) = 6^^ „^ /N{Nt) 
is the probability of picking halo a at random from a the set of N(Nt ) halos of richness Nt. All that remains is computing 
the probability P{Nobs\c(), which is simply given by 

PiNobsla) =J2^Rf^^N^^sPiU)- (B2) 
P 

Putting it altogether we find 

P{Nobs\Nt) = — ^ J2 SR.,NM,,N^.sPiM- (B3) 

Figure [1] plots the cost function A{Nobs) for each of the two exclusive matching algorithms, and for the non- 
exclusive maximum shared membership matching in the case of the s simulation. The corresponding plots for the 
other simulations are quite similar. The non-exclusive BCG matching and the probabilistic matchings give results 
almost identical to the non-exclusive maximum shared membership matching case, and are very much worse than any 
of the exclusive matching algorithms, demonstrating that enforcing a one to one matching between halos and clusters is 
of paramount importance. It is worth noting that maxBCG does not enforce exclusive galaxy membership for cluster, 
that is, a galaxy can be a member of more than one cluster. 

We focus now on the exclusive matching algorithms. In particular, each of these algorithms has an upturn in the cost 
function at high richness. This upturn is due to what we called noise in fJ21 i-c catastrophic errors in the cluster-finding 
algorithm. These catastrophic errors are in reality quite rare, so the mixing induced by non-zero matrix elements due 
to catastrophic errors generate an unrealistically large amount of mixing between low richness halos and high richness 
clusters. Consequently, we believe the upturn at high richness is unphysical. Given this assumption, based on Figure 
[1] we chose exclusive maximum shared membership matching as our fiducial halo-cluster matching algorithm for the 
remainder of this work.^^ 

MCMC IMPLEMENTATION 

Evaluation of the likelihood function in a multi-dimensional parameter space can be extremely time consuming. 
Fortunately, not all of parameter space needs to be sampled, only the high likelihood regions. We achieve th is through 
the use of Monte Carlo Markov Chains (MCMC) , drawing heavily on the work by iDunklev et all (|2005l ) (for some 
introductory material on MCMC methods see e.g. IWasserman|[2004|) . 

In this work, we r equire not only to fin d the point of maximum likelihood, but also the 68% and 95% probability 
likelihood contours. IDunklev et al.l ()2005D showed that the number of points in a chain necessary to recover a p 
confidence region with accuracy q is iV « 3.3D/q^/{l — p) where D is the dimensionality of the parameter space. 
When we hold our nuisance parameters fixed to the simulation calibrated values, the total number of free parameters 
is five, so the number of points in the chain necessary to recover the 95% confidence regions with 10% accuracy is 
N ^ 3 ■ 10^. Wc choose N = 5 ■ 10"' steps as the default length of our chains in this case. When fitting for all 13 

The fact that A is lower at high richness for exclusive BCG matching suggests that one try a hybrid matching algorithm for which 
exclusive BCG matching is used at high richness, and exclusive maximum shared membership matching is used at low richness. The 
transition richness Nt between these two matchings would then be a tunable parameter. We found that requiring this hybrid algorithm to 
perform comparably to the exclusive maximum shared membership algorithm in the low richness regime resulted in a transition richness 
Nt high enough that the hybrid algorithm became effectively identical to the exclusive maximum shared membership algorithm. 
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parameters (cosmology+HOD+nuisance). the expected minimum number of points required to appropriately sample 
the likelihood function is thus N « 10^, which we adopt as our default length for the chains. 

Now, in order to achieve convergence quickly, some take must be taken to ensure that the chain is close to optimal. 
In particular, b oth t he direction and size of the steps in the MCMC need to be carefully chosen. Here, we follow 
IDunklev et al.l (|2005( l. and take random steps along the principal components of the parameter correlation matrix. 
For each normal mode, we chose a step size la j^fD where a is the eigenvalue of the principal component and D is 
the number of parameters (either 4 or w 10 in our case). This step size i s sligh tly smaller than the value reported in 
IDunklev et al.l (j2005[ ). and is chosen because, as shown bv lDunklev et all (j2005f ). for problems with a large number of 
parameters, it is in general better to err on the low side when estimating the optimal step size. 

Finally, in order to be able to take steps along the principal components of the correlation matrix, we must first 
estimate the correlation matrix. To do so, we follow an iterative procedure where we start our MCMC in a region 
of relatively high likelihood, and use the first 1000 steps to estimate the correlation matrix varying only cosmological 
and HOD parameters to resolve any intrinsic degeneracies of the model. For the next 2000 steps we introduce the 
parameters characterizing the signal matrix as additional degrees of freedom, and estimate the new correlation matrix. 
We then allow the completeness, purity, and photo-z's parameters to vary, and use 3000 steps to estimate the correlation 
matrix. We make one final iteration of 4000 steps and re-estimate the correlations matrix, which is used to take an 
additional 10^ steps which constitute our chain for the purposes of parameter estimation. When keeping nuisance 
parameters fixed, we use only two training runs to estimate the correlation matrix, the first with 1000 steps, an d then 
an ite ration with 2000 steps. We checked all our chains for convergence with the tests described in Dunklcv et al.l 

We found that letting the signal matrix parameters float in our MCMCs introduced a very serious numerical difficulty. 
In particular, we found that in our model there was a small region of the 13 dimensional space where the estimated 
correlation matrix of the cluster number counts becomes singular, causing the likelihood function to blow up. We 
found this difficulty arose due to detailed numerical cancellation between the Poisson and binning contributions to 
the correlation matrix. Consequently, we decided to approximate the correlation matrix by replacing the Poisson 
contribution estimated from our model by XjNa where Ng. are the observed number counts. In high likelihood regions, 
one has (Na) ~ Na and hence the approximation should be valid. Indeed, we explicitly checked that we could reproduce 
all of our results when keeping the signal matrix parameters fixed while employing this approximation. 

The end result of our approximation was to greatly reduce the volume of parameter space where the cluster counts 
correlation matrix became singular. A typical output from a chain computed with the above likelihood is shown in 
Figure 1121 The fact the singular region in parameter space is small can be seen from the fact that the chain ran 
for w 9 • 10^ steps before encountering the singularity. Moreover, once the chain stepped into this region, it froze, 
demonstrating that virtually every step took it outside the singularity. To avoid this behavior, we can simply carve 
out the singular region of parameter space, which we do by introducing a cut on the likelihood. In particular, we set 
C = for any region where our model results in an unrealistically large likelihoods. It is worth noting that introducing 
such a cut without affecting the performance of the chain is not difficult. For instance, in Figure \Vll we see that the 
likelihood ratio between the singular region and the physical region is w 10^^. 
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Fig. 12. — An MCMC output illustrating the singularity of the correlation matrix in a small region of parameter space. If the estimated 
correlation matrix of the cluster number counts becomes singular, the likelihood function is infinite. To avoid the corresponding small 
problem region in parameter space we introduce a likelihood cut. See text for more discussion. 



